
    Sample-Efficient Model-Free Reinforcement Learning with Off-Policy Critics

    Value-based reinforcement-learning algorithms provide state-of-the-art results in model-free discrete-action settings, and tend to outperform actor-critic algorithms. We argue that actor-critic algorithms are limited by their need for an on-policy critic. We propose Bootstrapped Dual Policy Iteration (BDPI), a novel model-free reinforcement-learning algorithm for continuous states and discrete actions, with an actor and several off-policy critics. Off-policy critics are compatible with experience replay, ensuring high sample-efficiency, without the need for off-policy corrections. The actor, by slowly imitating the average greedy policy of the critics, leads to high-quality and state-specific exploration, which we compare to Thompson sampling. Because the actor and critics are fully decoupled, BDPI is remarkably stable, and unusually robust to its hyper-parameters. BDPI is significantly more sample-efficient than Bootstrapped DQN, PPO, and ACKTR, on discrete, continuous and pixel-based tasks. Source code: https://github.com/vub-ai-lab/bdpi. Comment: Accepted at the European Conference on Machine Learning 2019 (ECML
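    The actor step described in this abstract ("slowly imitating the average greedy policy of the critics") can be illustrated with a minimal sketch for a single state. The function names, the mixing rule, and the `rate` parameter are assumptions made for illustration; the abstract does not give the exact update, so this should not be read as the authors' implementation.

```python
def greedy_policy(q_values):
    """One-hot distribution on the action with the highest Q-value."""
    best = max(range(len(q_values)), key=q_values.__getitem__)
    return [1.0 if a == best else 0.0 for a in range(len(q_values))]


def actor_update(pi, critics_q, rate):
    """Move the actor's action distribution a small step towards the
    average of the critics' greedy policies (assumed mixing rule).

    pi        -- current actor probabilities for one state
    critics_q -- one list of Q-values per critic, for the same state
    rate      -- hypothetical actor learning rate in (0, 1]
    """
    n_actions = len(pi)
    avg_greedy = [0.0] * n_actions
    for q_values in critics_q:
        greedy = greedy_policy(q_values)
        for a in range(n_actions):
            avg_greedy[a] += greedy[a] / len(critics_q)
    # Convex combination of two distributions, so the result still sums to 1.
    return [(1.0 - rate) * p + rate * g for p, g in zip(pi, avg_greedy)]


# Three critics disagree about the best of two actions.
new_pi = actor_update([0.5, 0.5], [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0]], rate=0.1)
```

    Because the actor only drifts towards the critics' average, disagreement between critics keeps probability mass on several actions, which is one way to read the comparison with Thompson sampling.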

    Preferred Basis in a Measurement Process

    The effect of decoherence is analysed for a free particle interacting with an environment via a dissipative coupling. The interaction occurs through a coupling of the particle's position operator with the environmental degrees of freedom. Examining the exact solution of the density matrix equation, one finds that the density matrix becomes completely diagonal in momentum with time, while the position-space density matrix remains nonlocal. This establishes the momentum basis as the emergent 'preferred basis' selected by the environment, contrary to the general expectation that position should emerge as the preferred basis since the coupling with the environment is via the position coordinate. Comment: Standard REVTeX format, 10 pages of output. Accepted for publication in Phys. Rev

    Bandit Models of Human Behavior: Reward Processing in Mental Disorders

    Drawing inspiration from behavioral studies of human decision making, we propose a general parametric framework for the multi-armed bandit problem, which extends the standard Thompson Sampling approach to incorporate reward processing biases associated with several neurological and psychiatric conditions, including Parkinson's and Alzheimer's diseases, attention-deficit/hyperactivity disorder (ADHD), addiction, and chronic pain. We demonstrate empirically that the proposed parametric approach can often outperform the baseline Thompson Sampling on a variety of datasets. Moreover, from the behavioral modeling perspective, our parametric framework can be viewed as a first step towards a unifying computational model capturing reward processing abnormalities across multiple mental conditions. Comment: Conference on Artificial General Intelligence, AGI-1

    First-Stage Development of the Pitjantjatjara Translation of the World Health Organization’s Alcohol, Smoking and Substance Involvement Screening Test (ASSIST)

    Substance use is a leading contributor to global disease, illness and death. Compared with non-Indigenous Australians, Aboriginal and Torres Strait Islander Australians are at an increased risk of substance-related harms due to the experience of additional social, cultural, and economic factors. While preventive approaches, including screening and early interventions, are promising, healthcare workers currently have limited options that are culturally appropriate for use in Aboriginal and Torres Strait Islander populations. Therefore, the aim of this research was to translate and culturally adapt the World Health Organization-endorsed Alcohol, Smoking and Substance Involvement Screening Test (ASSIST) into Pitjantjatjara. This paper first describes the process of translation and adaptation of the instrument (Phase 1). The process of focus-group testing the translated instrument for accuracy and cultural appropriateness is also discussed (Phase 2). Key findings from both phases are presented in the context of how the research team worked with key stakeholders in the community to identify facilitators and work through barriers to implementation. The findings from this paper will be used to inform the development of a digital, app-based version of the instrument for the purposes of pilot-testing and validation.

    Scorpion incidents, misidentification cases and possible implications for the final interpretation of results


    A Multi-Armed Bandit to Smartly Select a Training Set from Big Medical Data

    With the availability of big medical image data, the selection of an adequate training set is becoming more important to address the heterogeneity of different datasets. Simply including all the data not only incurs high processing costs but can even harm the prediction. We formulate the smart and efficient selection of a training dataset from big medical image data as a multi-armed bandit problem, solved by Thompson sampling. Our method assumes that image features are not available at the time of the selection of the samples, and therefore relies only on meta information associated with the images. Our strategy simultaneously exploits data sources with high chances of yielding useful samples and explores new data regions. For our evaluation, we focus on the application of estimating the age from a brain MRI. Our results on 7,250 subjects from 10 datasets show that our approach leads to higher accuracy while only requiring a fraction of the training data. Comment: MICCAI 2017 Proceeding
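    The explore/exploit behaviour described in this abstract can be illustrated with a generic Beta-Bernoulli Thompson sampling sketch. It is not the paper's method: the arms stand in for hypothetical data sources, the reward is an assumed binary "this sample turned out useful" signal, and `true_rates` is invented purely to simulate that signal.

```python
import random


def thompson_select(successes, failures, rng):
    """Sample a usefulness rate for each arm from its Beta posterior
    (uniform Beta(1, 1) prior) and return the arm with the largest sample."""
    samples = [rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=samples.__getitem__)


def run_bandit(true_rates, n_rounds, seed=0):
    """Simulate drawing n_rounds samples from sources whose (hidden)
    probability of yielding a useful sample is given by true_rates."""
    rng = random.Random(seed)
    n_arms = len(true_rates)
    successes = [0] * n_arms
    failures = [0] * n_arms
    pulls = [0] * n_arms
    for _ in range(n_rounds):
        arm = thompson_select(successes, failures, rng)
        useful = rng.random() < true_rates[arm]  # simulated feedback
        if useful:
            successes[arm] += 1
        else:
            failures[arm] += 1
        pulls[arm] += 1
    return pulls


# Three hypothetical data sources with different usefulness rates.
pulls = run_bandit([0.1, 0.5, 0.9], n_rounds=2000)
```

    Arms with few observations have wide posteriors and are still sampled occasionally, while arms that keep yielding useful samples are chosen more and more often, which mirrors the simultaneous exploration and exploitation the abstract describes.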

    Production of Androgens by Microbial Transformation of Progesterone in Vitro: A Model for Androgen Production in Rivers Receiving Paper Mill Effluent

    We have previously documented the presence of progesterone and androstenedione in the water column and bottom sediments of the Fenholloway River, Taylor County, Florida. This river receives paper mill effluent and contains masculinized female mosquitofish. We hypothesized that plant sterols (e.g., β-sitosterol) derived from the pulping of pine trees are transformed by bacteria into progesterone and subsequently into 17α-hydroxyprogesterone, androstenedione, and other androgens. In this study, we demonstrate that these same androgens can be produced in vitro by the bacterium Mycobacterium smegmatis. In a second part of this study, we reextracted and reanalyzed the sediment from the Fenholloway River and verified the presence of androstadienedione, a Δ1 steroid with androgen activity.

    Examining Muscle Activity Differences During Single and Dual Vector Elastic Resistance Exercises

    Background: Elastic resistance exercise is a common part of rehabilitation programs. While these exercises are highly prevalent, little information exists on how adding a second resistance vector, with a different direction from the primary vector, alters muscle activity of the upper extremity. Purpose: The purpose of this study was to examine the effects of dual vector exercises on torso and upper extremity muscle activity in comparison to traditional single vector techniques. Study Design: Repeated measures design. Methods: Sixteen healthy university-aged males completed four common shoulder exercises against elastic resistance (abduction, flexion, internal rotation, external rotation) while using a single or dual elastic vector at a fixed cadence and standardized elastic elongation. Surface electromyography was collected from 16 muscles of the right upper extremity. Mean, peak and integrated activity were extracted from linear enveloped and normalized data, and a 2-way repeated measures ANOVA examined differences between conditions. Results: All independent variables differentially influenced activation. Interactions between single/dual vectors and exercise type affected mean activation in 11/16 muscles, while interactions in peak activation existed in 7/16 muscles. Adding a secondary vector increased activation predominantly in flexion or abduction exercises; little change occurred when adding a second vector in internal and external rotation exercises. The dual vector exercise in abduction significantly increased mean activation in lower trapezius by 25.6 ± 8.11 %MVC and peak activation in supraspinatus by 29.4 ± 5.94 %MVC (p<0.01). Interactions between single/dual vectors and exercise type affected integrated electromyography for most muscles; the majority of these muscles had the highest integrated electromyography in the dual vector abduction condition. Conclusion: Muscle activity often increased when a second resistance vector was added; however, the magnitude was exercise-dependent. The majority of these changes occurred in the flexion and abduction exercises, with little difference in the internal or external rotation exercises. Level of Evidence: 3